-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(python): Add security warning in LazyFrame.deserialize() docstring #15282
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, the text looks good. One note about formatting.
I think this also applies to Expr.deserialize
, right?
py-polars/polars/lazyframe/frame.py
Outdated
.. warning:: | ||
This function uses :mod:`pickle` under some circumstances, and as | ||
such inherits the security implications. Deserializing can execute | ||
arbitrary code so it should only be attempted on trusted data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add this as a Warnings
section rather than an admonition? The section should be below Parameters / above See Also. See https://numpydoc.readthedocs.io/en/latest/format.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should elaborate here. If we put a big fat security warning we should explain under which conditions. Polars only uses pickle if the logical plan was serialized with python UDFs. In most queries Polars won't use pickle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ritchie46 is that relevant for deserialize() though ? From the security perspective, what matters is whether you trust the data. If so, it does not matter whether it's using pickle. However, if you don't trust the data, it's very unlikely you would trust a promise that this JSON does not contain the UDF. As such, if the warning applies to your case, you'd have to assume it contains a malicious bit of pickle. So unless there is a way for the user to prevent loading pickle at the loss of some feature (e.g. a deserialize(..., forbid_pickle=True)
parameter), the conditions of production don't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gives you more information on when we call pickle, I think that's important. You can evenly verify that the JSON doesn't have any UDFs as it is human readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for what it's worth, I can see us adding a forbid pickle.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #15282 +/- ##
==========================================
- Coverage 81.32% 81.32% -0.01%
==========================================
Files 1359 1359
Lines 176072 176072
Branches 2526 2526
==========================================
- Hits 143195 143185 -10
- Misses 32393 32403 +10
Partials 484 484 ☔ View full report in Codecov by Sentry. |
…r.deserialize() docstring Add indication that LazyFrame.deserialize() and Expr.deserialize() might execute arbitrary code coming from the deserialized data. Fixes pola-rs#14623
88b1a58
to
299f73c
Compare
Updated the PR with:
|
Thanks! |
Add indication that LazyFrame.deserialize() might execute arbitrary code coming from the deserialized data.
Fixes #14623